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ABSTRACT 

The extraction of cosmological parameters from microwave background observations 
relies on specific assumptions about the statistical properties of the data, in particular 
that the p-point distributions of temperature fluctuations are jointly-normal. Using a 
battery of statistical tests, we assess the multivariate Gaussian nature of the Wilkin- 
son Microwave Anisotropy Probe (WMAP) 1st year data. The statistics we use fall 
into three classes which test different aspects of joint-normality: the first set assess the 
normality of marginal (one-point) distributions using familiar univariate methods; the 
second involves statistics that directly assess joint-normality; and the third explores 
the evidence of non-linearity in the relationship between variates. We applied these 
tests to frequency maps, 'foreground-cleaned' assembly maps and all-sky CMB-only 
maps. The assembly maps are of particular interest as when combined with the kp2 
mask, we recreate the region used in the computation of the angular power spectrum. 
Significant departures from normality were found in all the maps. In particular, the 
kurtosis coefficient, D'Agostino's statistic and bivariate kurtosis calculated from tem- 
perature pairs extracted from all the assembly maps were found to be non-normal 
at 99% confidence level. We found that the results were unaffected by the size of the 
Galactic cut and were evident on either hemisphere of the CMB sky. The latter sug- 
gests that the non-Gaussianity is not simply related to previous claims of north-south 
asymmetry or localized abnormalities detected through wavelet techniques. 

Key words: cosmic microwave background cosmology: theory methods: statistical 



1 INTRODUCTION 

Our picture of the Universe has evolved remarkably over the 
past century; from a view limited to our Galaxy to a cosmos 
with billions of similar structures. Today's era of "precision 
cosmology", cosmology appears to be concerned with im- 
proving estimates of parameters that describe the standard 
cosmological model rather than testing for possible alter- 
natives. Recent observations of the large scale structure (eg. 
2dFGRS; Percival et al. 2001) and the Wilkinson Microwave 
Anisotropy Probe (WMAP; Bennett et al. 2003a) observa- 
tions of the Cosmic Microwave Background (CMB) appear 
to confirm our key ideas on structure formation. However, 
there is the suggestion that our confidence may be mis- 
placed. The WMAP observations, for example, have thrown 
up a number of unusual features (Chiang et al. 2003; Efs- 
tathiou 2003; Naselsky, Doroshkevich & Verkhodanov 2003; 
Chiang & Naselsky 2004; Coles et al. 2004; Dineen & Coles 
2004; Eriksen et al. 2004; Larson & Wandelt 2004; Park 
2004; Land & Magueijo 2005a; Medeiros & Contaldi 2005). 
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In particular, it has been noted that the lowest spherical 
harmonics behave in a peculiar way, with an anomalously 
low quadrupole (see Efstathiou 2003, 2004) and confusion 
over the planar nature and alignment of the quadrupole and 
octopole (Tegmark, de Oliveira-Costa & Hamilton 2003; de 
Oliveira-Costa et al. 2004). Multipole vectors have also been 
used to show that there is preferred direction in the CMB sky 
(Copi, Huterer & Starkman 2004; Land k, Magueijo 2005b) . 
Wavelet techniques have demonstrated that particular re- 
gions in the sky display unusual characteristics (eg. Vielva 
et al. 2004; McEwen et al. 2005). Other statistical analyses 
give results that are compatible with Gaussianity (Colley & 
Gott 2003; Komatsu et al. 2003; Stannard & Coles 2005). 
There is some danger here of a "publication bias" in which 
only positive detections are reported so and it appears a 
good time to stand back from the established paradigm and 
systematically test the core assumptions intrinsic to our view 
of the universe. 

Standard cosmological models assume the structures 
we see today grew from small initial perturbations through 
gravitational instability. The initial perturbations are 
thought to be seeded during a period of inflation (Guth 1981; 



© 0000 RAS 



2 P. Dmeen & P. Coles 



Albrecht & Steinhardt 1982; Linde 1982). These primordial 
perturbations are the result of amplification of zero-point 
quantum fluctuations to classical scales during the inflation- 
ary period (Guth & Pi 1982; Hawking 1982; Starobinsky 
1982). In most inflationary scenarios the resultant primor- 
dial density field is taken to be a statistically homogeneous 
and isotropic Gaussian random field (Adler 1981; Bardeen 
et al. 1986). Such a field has both physical and mathemat- 
ical attractions. However, there are alternative possibilities 
that lead to a field with non-Gaussian statistics. Versions 
of inflation involving multiple scalar fields or those with 
non-vacuum initial states lead to a non-Gaussian spectrum. 
Other potential candidates for seeding the primordial field 
also lead to non-Gaussian possibilities. Topological defects 
arise from symmetries being broken as the early Universe 
cools. The topological defect scenario includes models based 
on cosmic strings, textures and monopoles (Kibble 1976; 
Vilenkin & Shellard 1994). The development of the defects 
involves nonlinear physics which leads to a non-Gaussian 
pattern of fluctuations. 

The question of non-Gaussianity in the CMB is inter- 
twined with the same question concerning the initial density 
perturbations. The inhomogeneity in the matter distribution 
is translated to the background radiation photons through 
Thomson scattering. After last scattering the majority of 
these photons arrive at our observatories unperturbed. If the 
primordial matter distribution is Gaussian then so too is the 
primary temperature pattern. Thus, the CMB anisotropies 
provides us with a clean tool to discriminate between ini- 
tial field models described in the previous section. However, 
before we can test this assumption we need to ensure we 
are looking at the last-scattering surface. Most secondary 
anisotropies are nonlinear in nature and hence lead to non- 
Gaussian signals. Lensing and the Sunyaev-Zel'dovich ef- 
fect should both be able to be isolated through their non- 
Gaussian effects. Moreover, both Galactic foregrounds and 
experimental systematic errors will leave non-Gaussian im- 
prints in the CMB measurements. Thus, the development 
of non-Gaussian statistics that can isolate these effects is 
imperative. The demand for such tracers will only intensify: 
increased sensitivity and polarization measurements will re- 
quire better control of secondary effects and foregrounds. 

One of the primary goals of the WMAP mission is 
to make precise estimates of the cosmological parameters. 
These parameters are extracted by comparing the measured 
angular power spectrum to cosmological model predictions. 
The statistical information of the (processed) temperature 
field is assumed to be entirely encoded in the power spec- 
trum, which is true if the field is a multivariate Gaussian. 
It is the aim of this paper to investigate this assumption: if 
it does not hold we may have to question the inferences ob- 
tained from the spectrum and address the causes of this non- 
normality whether cosmological, Galactic or systematic. 

In this paper we approach this issue from a general 
statistical standpoint. Rather than looking for specific non- 
Gaussian alternatives we simply explore the extent to which 
the assumption of multivariate Gaussianity is supported by 
the data using the most general approaches available. The 
layout of the paper is as follows. In the next section we clar- 
ify the definition of a multivariate Gaussian random field. 
In Section 3, we outline techniques we shall use to probe the 
normality of current CMB data sets from WMAP. In Sec- 



tion 4, we discuss the nature of the CMB data sets that the 
techniques are applied to. The details of the implementa- 
tion of the method are drawn in 5. In Section 6, we present 
the results of our analysis. The conclusions are discussed in 
Section 7. 



2 GAUSSIAN RANDOM FIELDS 

Gaussian random fields arise in many situations that require 
the modelling of stochastic spatial fluctuations. There are 
two reasons for their widespread use. One is that the Cen- 
tral Limit Theorem requires that, under very weak assump- 
tions, that linear additive processes tend to possess Gaussian 
statistics. To put this another way, the normal distribution 
is the least "special" way of modelling statistical fluctua- 
tions. The other reason is that Gaussian processes are fully 
specified in a mathematical sense. Very few non-Gaussian 
processes are as tractable analytically. The assumption of 
multivariate Gaussianity is consequently the simplest start- 
ing point for most statistical analyses. 

If we signify the fluctuations in a Gaussian random field 
by <5(r) = S, then the probability distribution function of S 
at individual spatial positions is given by 

^=(2^ eXP ('&) (1) 

where a 2 is the variance and S is assumed to have a mean 
of zero. In fact, the formal definition of a Gaussian random 
field requires all the finite-dimensional p-variate joint distri- 
bution of a set of 8(ri) = 5i to have the form of a multivariate 
Gaussian distribution: 

P(A) = L_Lexp(--A'.r- a), (2 ) 

where A = (<5i, 62, ■ ■ ■ ,6 P ) and £ is the covariance matrix 
(Krzanowski 2000). Therefore, knowledge of the mean and 
variance of each variate, and all covariances between pairs 
of variates, fully specifies the field. 

So how does this formalism manifest itself when we 
are dealing with the CMB? The temperature fluctuations 
AT(6, <f>) in the CMB at any point in the celestial sphere 
should be drawn from a multivariate Gaussian. That is to 
say, V(ATi, AT2, . . . , AT P ) should have the same form of 
Equation 2. The CMB fluctuations can also be expressed in 
spherical harmonics as 

00 m — + 1 

AT(e,4>)=Y, a im Y em (e,<f>), (3) 

1=1 m=—l 

where the ai m are complex and can be written 

aim = \atm\ exp[iipe m ]. (4) 

If the temperature field is a multivariate Gaussian then the 
real and imaginary parts of the ai m should be mutually in- 
dependent and Gaussian distributed (Bardeen et al. 1986). 
Furthermore, the phases ipi m should be random. In our anal- 
yses we look for non-Gaussianity in the WMAP data in both 
real and harmonic spaces. Both spaces allows us to probe a 
range of length scales. Harmonic space has the advantage of 
being more condensed. Whereas, the advantage of studying 
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in real space is that we can easily navigate contaminated re- 
gions or focus on a specific area in the CMB sky. The exact 
nature of the data used is fully described in Section 4. 



3 MULTIVARIATE ANALYSIS 

In this Section, we outline the procedures used later to exam- 
ine whether the WMAP data is strictly multivariate Gaus- 
sian. Generally, when looking for signs of non-normality, it 
helps to have an idea of the form it should take. The wide 
variety of possible sources of non-Gaussianity means that 
there is no unique form of alternative distribution to seek. 
Fortunately, this is quite a common problem in multivariate 
analyses, where real data does not adhere to any specific al- 
ternative model. This is unsurprising as there are very alter- 
native models for which the entire hierarchy of multivariate 
distributions is fully specified. For this reason statisticians 
tend to apply not just one test statistic to the data, but 
a battery of complementaty procedures. The assorted pro- 
cedures will have differing sensitivities to the shape of the 
distribution. There is also a need to augment these tests 
with analyses of subsets of the data. Testing the form Equa- 
tion 2 in all its generality is clearly impossible as it requires 
an infinite number of tests. In practice, various simplified 
approaches tend to be implemented: in particular, the one- 
point marginal distributions of the variates are often stud- 
ied. Marginal normality does not imply joint normality, al- 
though the lack of multivariate normality is often reflected in 
the marginal distributions. A further advantage of examin- 
ing the marginal distributions is that they are computation- 
ally less intense and more intuitive (i.e. easier to interpret) 
and thus more instructive. 

The procedures we apply to the data are described in 
three subsections. In subsection 3.1, we outline univariate 
techniques for assessing marginal normality. In subsection 
3.2, multivariate techniques for evaluating joint normality 
are sketched out. Lastly, we illustrate a procedure that eval- 
uates the degree to which the regression of each variatc 
on all others is linear in subsection 3.3. Throughout these 
subsections, we shall denote the members of the ith vari- 
ate, Xi, by Xij where j — 1, . . . , n. In subsection 3.2, we 
shall use the notation Xj = (xij, x%j, ■ ■ ■ , x P j)' and similarly 
Xfe = (xik, X2k, ■ ■ ■ , x P k)' ■ We wish to emphasize the distinc- 
tion between Xj and X; = (xn,Xi2, . . . , x in )' . 



3.1 Evaluating marginal normality 

The evaluation of marginal normality of the data is based 
on well-known tests of univariate normality. The marginal 
distributions we study correspond to the distribution for 
each individual variate. This is simply the distribution of 
members of the specified variate, ignoring all other mem- 
bers from the data-set. For example, in the bivariate case, 
the marginal probability distribution of the variate xi is 
given by 



P(xi) 



= f 

J — C 



P(xi,x 2 )dx 2 . 



(5) 



The parallel expression for the marginal distributions corre- 
sponding to larger values of the dimensionality p can easily 



be developed. We outline four techniques that probe uni- 
variate normality of V(~x.i): the skewness and kurtosis co- 
efficients; D'Agostino's omnibus test; and a shifted-power 
transformation test. In this subsection, we shall suppress 
the i indices when referring to the data-members such that 
Xj will refer to the individual members of x». 

The classic tests of normality is by means of evaluating 
the sample skewness \fbl and kurtosis b 2 coefficients. If we 
let the sample mean and the sample variance be x and S 2 
respectively, and define 



n ^ 

j'=i 



(6) 



to be the r th moment about the mean. Then the skewness 
is given by 



rr m 3 
1 = S3' 

and the kurtosis by 

m 4 

& 2 = ^. 



(7) 
(8) 



The expected value of the skewness (Mardia 1980) is 
E(Vb 1 )=0, (9) 
and the square root of the variance of this quantity is 



.(A) = 



+ 



15 



(10) 



Here, and throughout the rest of the paper, we use the no- 
tation E(Q) for the expected value of the statistic Q and 
cr(Q) to the signify the square root of the variance about 
this value. These values assume Q is applied to a Gaussian 
data-set. The same quantities for the kurtosis (Mardia 1980) 
are 



E(b 2 ) = 



and 



<r(b 2 ) 



3(n- 1) 
ra + l ' 



124 / 1^ 271 2319 \ 
~'\ ~ 27i + 8n* ~ I6n 3 + " ') ' 



(11) 



(12) 



So we have the expected value and variance of the kurto- 
sis and skewness. But how useful are these quantities? Are 
we justified in using Gaussian statistics to measure non- 
normality? As the value of n increases the distribution of 
E(s/bi) soon reverts to a normal distribution. However, the 
distribution of £'(62) is very skewed for n=100 and is hardly 
normal for n=1000 (Mardia 1980). Clearly large values of 
n are required to be confident of the any analysis using the 
kurtosis. This last point is addressed further in Section 4 
when our sample sizes are discussed. 

There are a number of general tests that can be used to 
monitor the shape of the distribution of a variate. Some of 
these try to combine the skewness and kurtosis coefficients 
into an omnibus test. Other statistics aim to probe differ- 
ent features of the shape of the parent distribution. Using 
these statistics tend to be problematic as they usually re- 
quire tabulated coefficients that are cumbersome and not 
available for large n. For this reason D'Agostino (1971) de- 
veloped an omnibus test of normality for large sample sizes 
based on order statistics. The test statistic is defined as 
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D = 



n 2 S 



n 

E(''-5< n + 1 >> 



(13) 



where x^, X( 2 ), • ■ •, £(n) are the n order statistics of the 
sample such that x^\) < x^) < ■ • ■ < £(n)- The expectation 
is 



2^' 



and the variance asymptotically is given by 
0.02998598 



(7(D) = 



(14) 



(15) 



So far, we have looked at descriptive measures of the 
normality of the marginal distribution. There are also tests 
based on transforming the data. Box & Cox (1964) proposed 
using a shifted-power transformation to assess normality 



((xi+0 A -l)/A \*0, Xj >-Z 
\og(xj + £) A = 0, Xj > -£ 



(16) 



The transformation aims to improve the normality of the 
variate x; A can be estimated by maximum likelihood and 
the null hypothesis, Ho: X = 1, tested by a likelihood ratio 
test. The shift parameter £ is included in the transforma- 
tion because it appears to respond to the heavy-tailedness 
of the data, whereas, A appears sensitive to the skewness 
(Gnanadesikan 1997). Specifically, it can be shown that the 
log-likelihood function is given by 



£ max (£, A) = ~ In a 2 + (A - 1) Info + 



(17) 



where & is the maximum likelihood estimate of the trans- 
formed distribution that is presumed to be normal such that 



n 

-2 1 ( 



ASA) 



3 = 1 



where 

x (t ' x) 



i n 

- V x (e ' A) 



(18) 



(19) 



3 = 1 



Equation 17 is maximised to obtain the maximum 
likelihood estimates £ and A. Finally, a significance 
test can be constructed by comparing the value of 
2(£ max (£, A) — £max(£> 1)) to a \ 2 distribution with one de- 
gree of freedom. Note that £ ma x(£, 1) is independent of £ so 
this is a free parameter in the test. 

3.2 Evaluating joint normality 

As we have already remarked, deviations from joint nor- 
mality should be detectable through methods directed to- 
ward testing the marginal normality of each variate. How- 
ever, there is a need to explicitly test the multivariate nature 
of the data. Therefore, we look at three techniques that fulfil 
this requirement. These techniques are multivariate gener- 
alizations of univariate tests outlined in Section 3.1. 

The first two techniques to evaluate joint normality are 
multivariate measures of skewness and kurtosis. These meth- 
ods were developed by Mardia (1970) and make use of Maha- 
lanobis distance of Xj and Mahalanobis angle between Xj — x 
and Xfc — x. The Mahalanobis distance is defined as 



2 
r 3 



(xj -x)'S -x) 



(20) 



where S is the sample covariance matrix. It is often used to 
assess the similarity of two or more data sets. The Maha- 
lanobis angle is given by 



r jk = (xj - x)'S 1 (x fc -x) 



(21) 



The multivariate skewness is related to the Mahalanobis an- 
gle and as such reflects the orientation of the data. It is 
defined as 



j = l k=l 



(22) 



where nbi p /6 is asymptotically distributed as a \ 2 with p(p+ 
l)(p + 2)/6 degrees of freedom. In the bivariate case, this is 
a xl distribution. The multivariate kurtosis is defined as the 
mean of the square of the Mahalanobis distance 



n 



(23) 



It is expected to be normally distributed with mean p(p + 2) 
and variance 8p(p + 2)/n. Therefore, we have £(622) = 8 
and (7(622) = \J 64/n. 

Our last multivariate technique is an extension of the 
transformation test. We simply look at just the power trans- 
formation of each variable separately ignoring the shift pa- 
rameter £. That is to say 



X; 



logfxi 



1) /Ai 



A, = 



The maximised log-likelihood function is given by 



c(A) 



-?ln|£| 



E 



(A, - 1) Inxij 



3=1 



(24) 



(25) 



where Y. is maximum likelihood estimate of the covariance 
matrix of the transformed data (Gnanadesikan 1997). T. 
is computed in analogous fashion to a 2 in Equation 17. 
The transformation with A = (Ai,...,A p )' = 1 is the 
only transformation consistent with normality. Therefore, 
as with the univariate case, we can compare the value of 
2(£ max (A) - £max(l)) to a \p distribution. 

3.3 Evaluating linearity 

There is a further aspect of multivariate normality that we 
can examine in our data set: all variates, if not indepen- 
dent of each other, should be linearly related. Here we are 
not specifically testing the normality of the variates, as it is 
easy to imagine variates that are extremely non-normal yet 
linearly related (eg. two identical sawtooth distributions). 
Importantly though, any non-linearity does inform us about 
the usefulness of the covariance matrix. Whereas, marginal 
non-normality may allow you to still extract useful infor- 
mation from the covariance matrix. Non-linearity results in 
the more serious consequence of the covariance matrix be- 
ing a poor indicator, even qualitatively, of the association 
of variates (Cox & Small 1978). Our prime motivation for 
assessing the multivariate Gaussianity of the WMAP data 
is due to statistical nature of the data underpinning the ex- 
traction of cosmological parameters. It is therefore essential 
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that the covariance matrix fully specifies all the information 
from the data set. Clearly, testing linearity is not only com- 
plementary to the previous tests we have outlined, but also 
can be thought of as a vital step in assessing the Gaussianity 
of CMB data. 

Investigating the linearity of regression of variates as a 
means of scrutinizing multivariate normality was first pro- 
posed by Cox & Small (1978). We shall outline a related 
method for obtaining and evaluating regression coefficients 
described in Montgomery (1997). To begin with, let us look 
at the general concepts behind multiple linear regression. If 
we have n measurements of two variables y and z, then, the 
two variables are related linearly when the following model 
fully describes their relationship 



A + AiZj + ej 



(26) 



where tj is the error term in the model and (as usual) the 
subscript j runs from 1 to n. The parameters A a are referred 
to as regression coefficients. We can consider building other 
terms into our model 



yj = A + AiZji + A 2 z j2 H + A q z jq + e 3 , 



(27) 



where Zj a can be Zj, Zjyj, z x J A and so on. Linearity dictates 
that the regression coefficients A a should be zero for all non- 
linear terms. So how do we estimate A a ? A typical technique 
is to minimize the error term tj. If we rewrite Equation (27) 
in matrix notation, we have 



ZA + e 



(28) 



where A = (Ao, • • • , A q )' and Z is a nx(q+l) matrix with the 
first column comprised of ones and the subsequent columns 
corresponding to k regressor variables in the model. The 
error term can be minimized using the method of least- 
squares. The least-squares function is given by 



L = ^=e'e=(y-ZA)'(y-ZA). 



(29) 



2 = 1 



Differentiating Equation 29 with respect to A and equating 
this to zero, minimizes this function and leads us to 

A=(Z'Z)- 1 Z'y, (30) 

which is our best estimate of A. 

Once we have obtained a set of regression coefficients, 
we wish to test the significance of each coefficient. If the 
variables are linearly related then the coefficients for the 
additional coefficients to our model should be zero. Given 
an individual coefficient A a , we therefore want to check the 
hypothesis Hq: A a — against H\\A a 7^ 0. If Ho is not 
rejected for one of the additional coefficients, then the term 
needs to be added to model the behaviour of the variables. 
Montgomery (1997) outlines a statistic to that can be used 
to test this hypothesis. The error in our estimate of A is 
given by 

* 2 = ^-( y ' y - A ' Z ' y )- (31) 
The test statistic for Ho is 



to = 



A n 



(32) 

where C aa is the diagonal element of (Z'Z) -1 corresponding 



to A a . The statistic to should match a i-distribution with 
(n — q — 1) degrees of freedom. 



4 DATA 

The WMAP instrument comprises 10 differencing assem- 
blies (consisting of two radiometers each) measuring over 5 
frequencies (~ 23, 33, 41, 61 and 94 GHz). The two lowest 
frequency bands (K and Ka) are primarily Galactic fore- 
ground monitors, while the highest three (Q, V and W) are 
primarily cosmological bands (Hinshaw et al. 2003). 

In our study, we look at a total of 15 maps con- 
structed from the WMAP data. All the maps were ob- 
tained from the NASA's LAMBDA"!" data archive. The 
maps are in HEALPix'f format with a resolution parame- 
ter of A r sid e=512. The data consists of: 5 frequency maps; 
8 'foreground-cleaned' assembly maps; and 2 CMB-only 
maps. We shall refer to these sets of maps as sets (a), (b) 
and (c), respectively. Set (a) is composed of five maps corre- 
sponding to each frequency bandpasses observed by WMAP. 
For each pixel in these maps, noise weighting is used to com- 
pute the average temperature from the individual differenc- 
ing assembly map values. Set (b) consists of 'foreground- 
cleaned' sky maps corresponding to each individual differ- 
encing assembly. The assemblies are labelled Ql, Q2, VI, 
V2, Wl, W2, W3 and WA, where the letter corresponds to 
the frequency band. These maps were used in the calculation 
of the angular power spectrum by the WMAP team (Hin- 
shaw et al. 2003). The Galactic foreground signal was re- 
moved using a 3-band, 5-parameter template fitting method 
described in Bennett et al (2003b). Set (c) corresponds to 
two CMB-only maps constructed by the WMAP team (Ben- 
nett et al. 2003b) and Tegmark, de Oliveira-Costa & Hamil- 
ton (2003) (see papers for details). We shall refer to these 
two maps as the ILC and TOH maps, respectively. Both 
maps were assembled in a manner that minimises foreground 
contamination and detector noise, leaving a pure CMB sig- 
nal. The ultimate goal of these two maps is to build an 
accurate image of the last-scattering-surface that captures 
the detailed morphology. 

For CMB analyses, it is necessary to mask out regions 
of strong foreground emission. Bennett et al. 2003b pro- 
vide masks for excluding regions where the contamination 
level is large. The masks are based on the K-band measure- 
ments, where contamination is most severe. The severity of 
the mask is a compromise between eliminating foregrounds 
and maximising sky area in analyses. In our analysis, we 
concentrate our statistics on maps with the kp2 mask ap- 
plied. This results in 15.0% of pixels being cut. When set 
(b) is combined with the kp2 mask, we replicate the data 
used in the calculation of the angular power spectrum. The 
Galactic sky cut is unnecessary when looking at set (c) as 
the whole sky should be uncontaminated, however, we still 
apply the cut for consistency and simplicity. We also occa- 
sionally refer to results with the application of the kpO mask 
which excludes 23.0% of the sky. 



t http://lambda.gsfc.nasa.gov 
t http://healpix.jpl.nasa.gov 
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5 IMPLEMENTATION 

In order to keep our study as focussed as possible, our initial 
multivariate test is bivariate: simply involves looking at pairs 
of variates extracted from the maps. This allows us to use 
the same methods on both real and harmonic space quan- 
tities as we shall explain later in this section. The study of 
bivariates has the additional benefit of reducing the compu- 
tational requirements. However, we should emphasise that 
all the methods outlined in Section 3 are applicable to larger 
values of p. For a pair of variates (xi, X2), we assess the nor- 
mality of the bivariate probability distribution - P(xi,X2). 
The probability distribution should have the form outlined 
in Equation 2. In real space, we look at temperature pairs 
(ATi, AT2). This allows us to assess the 2-point correlation 
function if we extract pixel pairs across a range of scales. We 
randomly select 1 million temperature pairs from a given 
map. As there are roughly 3 million pixels, it means that 
there are ~ 10 13 combinations of pairs. We found that a 
million pairs were sufficient to replicate the overall distribu- 
tion. We then calculate the angular separation of each pair. 
The pairs are then grouped into 100 bins according to their 
separation such that the bin sizes are roughly equal. Thus, 
we are left with 100 bins with n~ 10, 000. In harmonic space, 
we study the distribution of the the real 5ft and imaginary $j 
parts of the spherical harmonic coefficients. This idea could 
be extended to different mode functions. Spherical harmonic 
modes are independent for a Gaussian random field defined 
on a sphere. The effect of a cut or other mask is to correlate 
these models, but only through the introduction of a linear 
covariance. This means that this method could be used to 
study any combination of modes even on a cut sky. 

One of the other advantages of looking in harmonic 
space is that it is trivial to probe differing scale lengths. 
We simply study the distribution of (5ft, 3) for a given value 
of I. Consequently, we have bins with n = £. We probe scales 
up to and including ^ = 600, however, due to the small value 
of n at low I we ignore the results for £ < 100. However, 
the effect of using masks on sets (a) and (b) means that the 
harmonic coefficients cannot easily be calculated. Thus, we 
only study harmonic space for set (c) where the whole-sky 
can be appropriately studied. 

We now outline how the various statistics were com- 
puted for bivariate data. The skewness, kurtosis and 
D'Agostino's statistics, applied to the marginal distribu- 
tions, are trivial to calculate. As we have already mentioned, 
there is an issue with distribution of the expected kurtosis 
£(62) being non-Gaussian for small n. This is not an issue 
in real space where n is very large, but, it is worth bear- 
ing in mind in harmonic space. We could address this issue 
by modelling the distribution of E(b 2 ) using Monte Carlo 
simulations. However, as this is only an issue for a small 
part of our analysis and require significantly more compu- 
tation we chose not to adopt this method. The univariate 
shifted-power transformation method is more challenging 
to implement. The method requires a careful choice of min- 
imiser/maximiser. The majority of minimizers call for the 
derivative of the function being minimized. The derivative 
of Equation (17) is computationally taxing. Therefore, we 
choose to use a 'simplex downhill' minimizer (Press et al. 
1992) that requires only the actual function. The simplex 
minimizer is applicable to multidimensional functions which 



fulfils our need for minimizing a 2-dimensional function. 
Most minimizers do not find the global minimum, instead 
they get 'trapped' in local minima. We allow for this by 
testing our code with various initial starting points along 
the function and varying the step size. Nevertheless, there 
is still the possibility that we may fail to find the true min- 
imum, especially if the shape of the function is complex. 
Indeed, evidence is found that we fail to obtain the global 
minimum in some of our results (see later) . However, we our 
not unduly worried by this since a local minimum will result 
in smaller value of 2(£ max (£, A) — £ max (£, 1)) than that cal- 
culated from the true minimum. Thus, we will fail to detect 
non-normality rather than falsely claim non-normality. 

The bivariate skewness and kurtosis arc simple to cal- 
culate. However, the bivariate power transformation method 
has the same challenges as in the univariate case. Again, wc 
are trying to minimize a 2-dimensional function and so we 
turn to the simplex minimizer. Obviously, by using this min- 
imizer we inherit the same problems as for the univariate 
transformation. 

Finally, we look at the linearity of the temperature pairs 
and (5ft, Sy) pair. We add separately three non-linear terms 
to our model of the data: 

x 2j = A Q + A-ix-ij + A 2 x\ j 

x 2j = A + Aixij + A 2 x% (33) 
x 2 j = A Q + A-ix-ij + A 2 /xij 

where, as before, Xij are elements from the variate x;. Our 
task is now simply to assess the significance of the coefficient 
A 2 for the three non-linear terms using Equation (32) . 

6 RESULTS 

In this section, we present the results of our analysis on the 
WMAP-derived data. Throughout the section, the expec- 
tation value of a statistic E(Q), if non-zero, is marked in 
the figures as a straight line. The dark grey areas in the 
plots represent the 95% confidence regions. For example, if 
the distribution of E(Q) is (or approximates for large n) 
a normal distribution, then the dark grey region signifies 
■E(Q)±1.95996ct. The light grey region is the 99% confidence 
region. In section 6.1, we display and discuss the results of 
applying our statistics to temperature pairs from the assem- 
bley maps, the frequency maps, the CMB-only maps and a 
sample Gaussian Monte Carlo (MC) map. The latter map is 
used as a reference guide and to reassure us that the compu- 
tation of a statistic is accurate. For each map, the statistics 
were applied to 100 binned distributions and therefore we 
expect one value (for each statistic) to be outside the 99% 
confidence region. In section 6.2, we discuss the same analy- 
sis on the spherical harmonic coefficients obtained from the 
CMB-only maps and the same sample Gaussian MC map. 
For each map, the statistics are applied to 501 distributions 
(I = 100 — 600) and as such we expect 5 points beyond the 
99% confidence region. 

6.1 Real Space 

6.1.1 Marginal normality results 

The results of the application of the univariate methods on 
the temperature pairs are displayed in Figures 1-8. We dis- 
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Figure 1. Skewness results from the foreground— cleaned assem- 
bly maps. (Top left) Ql assembly map, (top right) Q2 assembly 
map, (2nd from top, left) VI assembly map, (2nd from top, right) 
V2 assembly map, (3rd from top, left) Wl assembly map, (3rd 
from top, right) W2 assembly map, (bottom, left) W3 assembly 
map, and (bottom, right) W4 assembly map. 

play the results for ATi to increase clarity, although unsur- 
prising, the corresponding plots for ATb resemble those for 
ATi. 

The skewness results are shown in Figures 1 and 2. The 
assembly data shows no signs of being skewed. However, 
the two non-cosmological frequencies (K and Ka) are very 
skewed. Clearly, the maps are heavily contaminated with 
Galactic foregrounds even outside the kp2 cut. Moreover, 
the construction of the kp2 mask means that the shape of 
the distribution of K has an artificial cut-off at high AT. 
The skewness coefficients appear uniform across all angu- 
lar separations. Looking at the results from the cosmologi- 
cal frequencies, the Q band frequency appears slightly pos- 
itively skewed but the two higher frequency display no sign 
of non-normality. Interestingly, when we look at the CMB- 
only maps, the TOH map appears negatively skewed- there 
are roughly 15 points outside the 99% confidence region. 
This is the opposite sign to the two non-cosmological bands 
which may suggest that the foreground-removal process has 
resulted in an over-subtraction. The ILC map also appears 
to be slightly negatively skewed. 

The analyses of the kurtosis coefficient are shown in 
Figures 3 and 4. The kurtosis is higher than expected in 
all 8 assembly maps and across all angular separations. The 
kurtosis is even greater in the non-cosmological frequency 
maps suggesting that not all the foreground information has 
been removed from the assembly maps. The two CMB-only 
maps also hint at this but with less certainty. Both maps 




Angular Separation 



Figure 2. Skewness results from the 5 frequency maps, the 2 
CMB-only maps and a Gaussian MC map. (Top left) K band 
map, (top right) Ka band map, (2nd from top, left) Q band (2nd 
top, right) V band map, (3rd from top, left) W band map, (3rd 
from top, right) ILC CMB-only map, (bottom, left) TOH CMB- 
only map, and (bottom, right) a Gaussian MC map. 

have a slightly higher than expected kurtosis especially when 
the two are compared to results from the Gaussian MC map. 
That is to say, their distributions are too peaked, and as with 
the skewness results suggests errors in foreground modelling. 

D'Agostino's test statistic is shown in Figures 5 and 6. 
The statistic is an omnibus statistic so it should be unsur- 
prising that it also pick up evidence of non-normality in 
all the WMAP-derived maps. Again, the results from the 
assembly maps have values mainly outside the 99% confi- 
dence region. This shift away from the expected value is even 
greater for the K and Ka bands. The three cosmological fre- 
quency maps produce results that appear non-normal but 
with much less significance than the Galactic bands. The two 
CMB-only maps also appear non-normal. At this point, wc 
note that both the kurtosis and D'Agostino's statistic have a 
greater shift away from their expected values for the Galac- 
tic frequencies for the very large and very small angles. This 
feature is seen later on in the other statistics we employ. 

The results of the shifted-power transformation of the 
data sets are shown in Figures 7 and 8. This part of our anal- 
ysis was the most computationally challenging aspect, there- 
fore, we are pleased to see that the result from the Gaussian 
MC map is as expected. Due to the challenge of this part of 
work we really look for severe departure from non-normality 
and keep our discussion brief. The assembly maps appear to 
have slightly more than expected points beyond the 99% 
confidence region. Moreover, the two non-cosmological fre- 
quency maps show clear signs of non-normality. The two 
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Figure 3. Kurtosis results from the foreground-cleaned assembly 
maps. 
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5. D'Agostino's statistic results from the foreground- 
assembly maps. 
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Figure 4. Kurtosis results from the 5 frequency maps, the 2 
CMB-only maps and a Gaussian MC map. 
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Figure 6. D'Agostino's statistic results from the 5 frequency 
maps, the 2 CMB-only maps and a Gaussian MC map. 
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Figure 7. Univariate transformation results from the 
foreground-cleaned assembly maps. 
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Figure 8. Univariate transformation results from the 5 frequency 
maps, the 2 CMB— only maps and a Gaussian MC map. 



CMB-only maps also show signs of non-normality- the TOH 
map result appearing to have departed furthest from nor- 
mality. 



6.1.2 Joint normality results 

Our bivariate analysis results are shown in Figures 9-14. The 
bivariate skewness statistics (n&i2 /6) calculated from the 16 
maps are shown in Figures 9 and 10. The statistic appears to 
be abnormal for three of the assembly maps- Ql, W2 and 
W4. Some of the other assembly maps have two or three 
points outside the 99% confidence region but visually the 
results do not look too unusual. Once again, the two non- 
cosmological frequencies have results that strongly indicate 
non-normality. This non-normality is still evident in the Q 
and V band. The two CMB maps also have a higher than 
expected number of points above the 99% confidence region. 
It would appear that the bivariate skewness results mimic 
their univariate counterparts. 

The bivariate kurtosis results are displayed in Figures 
11 and 12. As with the skewness results, bivariate kurtosis 
seem similar to their univariate equivalent. Nevertheless, in 
the case of the assembly maps, the shift away from the ex- 
pected value is even greater than for the univariate results. 
As before, the Galactic frequency maps are clearly found to 
be non-normal. The higher value than expected value of the 
bivariate kurtosis persists in the foreground-cleaned CMB 
maps 

The bivariate power transformation is shown in Figures 
13 and 14. The assembly results do not look entirely consis- 
tent with being drawn from a xl distribution. Five of the 
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Figure 9. Bivariate skewness results from the foreground- 
cleaned assembly maps. 
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Figure 10. Bivariatc skcwncss results from the 5 frequency maps, 
the 2 CMB— only maps and a Gaussian MC map. 
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Figure 12. Bivariatc kurtosis results from the 5 frequency maps, 
the 2 CMB— only maps and a Gaussian MC map. 
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Figure 11. Bivariatc kurtosis results from the foreground- 
cleaned assembly maps. 



assembly maps produce results with 5 or more points out- 
side the 99% confidence region. Saying that, our Gaussian 
MC map has four points outside this region, which makes 
it hard to draw definite conclusions. This is certainly not 
true for the two Galactic frequency bands that are clearly 
inconsistent with normality. The two CMB-only maps also 
appear to have an extremely high number of points beyond 
the 99% confidence region. 



6.1.3 Linearity results 

Lastly, in this subsection, we assess the linearity of the data. 
We tried adding separately three non-linear terms to our 
linear model of the data (as described in 5). However, all 
three terms were found to be unnecessary descriptors for 
the data from all of the maps bar the heavily contaminated 
K band map. This is reassuring as it tells us that the angular 
power spectrum supplies reliable information about the data 
sets. We plot in Figures 15 and 16 the results for addition 
of the z| non-linear term. Curiously, the non-linearity seen 
in the K band map is seen at the largest and very smallest 
angular separations. As has already been noted, this trend 
is seen in other statistics that we employ. 



6.2 Harmonic space 

6.2.1 Marginal normality results 

The results of applying the skewness, kurtosis and 
D'Agostino's statistics are shown in Figures 17, 18 and 19, 
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Figure 13. Bivariate transformation results from the foreground- 
cleaned assembly maps.. 
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Figure 15. Linear regression results from the foreground-cleaned 
assembly maps. 
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Figure 14. Bivariate transformation results from the 5 frequency 
maps, the 2 CMB-only maps and a Gaussian MC map. 




0° 45 



90° 135 



0° 45° 90° 135° 180° 
Angular Separation 



Figure 16. Linear regression results from the 5 frequency maps, 
the 2 CMB-only maps and a Gaussian MC map. 
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Figure 17. Skcwncss results from the a^ m of the 2 CMB-only 
maps and a Gaussian MC map. {Top left) ILC CMB-only map, 
{top right) TOH CMB-only map and {bottom) a Gaussian MC 
map 
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Figure 18. Kurtosis results from the a,£ m of the 2 CMB-only 
maps and a Gaussian MC map. 

respectively. The results from the ILC map appear consis- 
tent with normality. However, all three statistic behave un- 
usually for the TOH map on scales smaller than £ ~ 400,. 
This is particularly true of the kurtosis coefficient where the 
non-normality is most evident. It would be interesting to 
relate this non-normality to that already seen in real space. 
This could hopefully give us a better handle on the source 
of the non-Gaussianity (whether Galactic or cosmological) . 
The univariate transformation results are shown for com- 
pleteness in Figure 20. However, the method appears unre- 
liable as the Gaussian MC map results are inconsistent with 
being drawn from a Xi distributions. We feel this unreliabil- 
ity is due to the small values of n that make the minimized 
function shape more complex. 

6.2.2 Joint normality results 

The application of our bivariate skewness and kurtosis 
statistics are shown in Figures 21 and 22. Once again, the 
ILC map behaviour corresponds to that of the Gaussian 
MC map. However, the TOH map shows clear signs of non- 
normality. The non-normality is displayed from larger scales 
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Figure 19. D'Agostino's statistic results from the ai m of the 2 
CMB-only maps and a Gaussian MC map. 
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Figure 20. Univariate transformation results from the a^ m of 
the 2 CMB-only maps and a Gaussian MC map. 

than the univariate counterparts. That is to say, the distri- 
bution of the a tm s appears unusual at scales greater than 
I = 300. The results from the multivariate power transfor- 
mations using harmonic space data are shown in Figure 23. 
Looking at the results from the ILC and Gaussian MC maps, 
it appears that we are failing to find the true global min- 
imum. As discussed earlier, finding a local minimum will 
result in an underestimate of the value of \\- Therefore, 
we are not too concerned about this as failure to find global 
minima since it failure does not result in false claims of non- 
Gaussianity. Intriguingly, the shape of the line for the TOH 
map does not match those of the other two maps. If we are 
failing to find the global minima then the we would expect 
a larger number of points to be beyond the 99% confidence 
region if we corrected this failure. Equally, the result may 
reflect that the distributions extracted from the TOH maps 
make finding the global minimum easier because they are, 
to some extent, smoother. 

6.2.3 Linearity results 

In our analysis of the linearity of the harmonic space vari- 
ates, we do not find any evidence of non-linearity. This was 
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Figure 22. Bivariate kurtosis results from the ai m of the 2 CMB- 
only maps and a Gaussian MC map. 




200 4-00 
Scale (I) 



600 



Figure 23. Bivariate transformation results from the ci£ m of the 
2 CMB-only maps and a Gaussian MC map. 



also the case when we looked at the real space temperature 
pairs. To illustrate this, we display in Figure 24 the results 
from the addition of the z\ non-linear term . 



6.3 Other results 

In light of the positive detections of non-normality, it is 
obviously important to gain some insight into its cause. In 
the Introduction, we stated that the distributions resulting 
from the various sources of non-Gaussianity are poorly un- 
derstood. Nevertheless, we can reduce the influence of pos- 
sible contaminants. In particular, by applying a more severe 
mask to the data we should be able to observe regions of 
the CMB-sky where there is less Galactic contamination. 
We looked at the assembly data with the more severe kpO 
mask applied. Our techniques produce identical results to 
those obtained from the kp2 mask: the skewness is consis- 
tent with normality; the kurtosis is inconsistent (~ 3.2); 
D'Agostino's statistic is inconsistent (ranging from 0.280 to 
0.281); and the bivariate kurtosis is inconsistent (~ 8.5). 
This suggests that Galactic effects may not be causing the 
non-normality we measure. However, we should be wary of 
jumping to the conclusion that the signal is cosmological in 
origin. Over subtraction, residual inhomogeneous noise, or 
other systematic effects may be the problem. 

Another pertinent question is whether the detected 
non-normality is associated with previous claims of non- 
Gaussianity. Eriksen et al. (2004) find an asymmetry be- 
tween the northern and southern Galactic hemispheres, with 
the northern portion appearing devoid of large-scale struc- 
ture. Our techniques in real space allow us to localize re- 
gions of the sky. We apply our statistic to the northern 
and southern parts of the Wl assembly map. We find that 
the kurtosis, D'Agostino's and bivariate kurtosis statistics 
show identical signs of non-normality on both hemispheres. 
This suggests that the non-normality is symmetric about 
the Galactic plane. Moreover, this also rules out the non- 
normality being associated with a localised 'cold-spot' as 
detected by wavelet techniques (McEwen et al. 2005; Vielva 
et al. 2004). 
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7 CONCLUSION 

In this paper, we have outlined a series of statistics that can 
be used to assess the multivariate Gaussian nature of CMB 
data. The extraction of cosmological parameters from this 
data relies on it being jointly normal. The statistics we de- 
scribe test differing aspects of joint-normality. The first four 
statistics assess the normality of marginal distributions us- 
ing familiar univariate methods. We then utilise three statis- 
tics that directly assess joint -normality. Finally, we look for 
evidence of non-linearity in the relationship between vari- 
ates. We applied these tests to bivariates extracted from 
maps derived from the WMAP 1st year data. The maps con- 
sisted of 5 frequency maps, 8 'foreground-cleaned' assembly 
maps and 2 CMB-only maps. The maps were assessed with 
the kp2 mask applied. The bivariates extracted were temper- 
ature pairs (ATi, AT?) and the real and imaginary parts of 
the spherical harmonic coefficients ae m . Although, the latter 
was only assessed for the two CMB-only maps. 

Significant departures from normality were found in all 
the maps in both real and harmonic spaces. In particular, 
temperature pairs extracted from all 8 assembly maps were 
found to be inconsistent with joint-normality. These maps 
are used to calculate the angular power spectrum and subse- 
quently deduce cosmological parameters. Marginal distribu- 
tions were found to have values of kurtosis and D'Agostino's 
statistic outside the 99% confidence regions. Temperatures 
pairs from the same maps were also found to have values of 
bivariate kurtosis outside the 99% confidence region. These 
departures were found at all angular separations. In trying 
to ascertain the origin of this non-normality, we found that 
the results were unaffected by the size of the Galactic cut 
and were evident on either hemisphere of the CMB-sky. This 
last aspect rules out the non-Gaussianity being related to 
previous claims of north-south asymmetry or 'cold-spots' 
detected by wavelet techniques, although residual systemat- 
ics from the map-making process remain a likely possibility 
for the origin of the signal. 

The transformation techniques for assessing normality 
described in this paper are quite challenging to implement. 
In future work, we hope to improve our method such that 
greater confidence can be placed on the results. One of the 
benefits of the transformation techniques is that they pro- 
vide a natural solution to how to modify the data such that 
it is Gaussian. The positive detections of non-normality seen 
in bivariate data, should make it worthwhile to assess quan- 
tities with larger values of p. This will enable us to build 
a broader picture of the shape of the distribution. Such 
knowledge should be supplemented with advances in the 
understanding of distributions from known sources of non- 
Gaussianity (whether cosmic or Galactic). The techniques 
outlined can also be used to study other quantities derived 
from the data that should conform to joint-normality. For 
example, the techniques could be applied to coefficients from 
wavelet or multipole vector analyses. Finally, we note that 
the techniques can be incorporated into methods for sub- 
tracting sources of non-Gaussianity. Firm requirements on 
the final data, such that they satisfy these tests, will result 
in cleaner data-sets. 

We stress that the currently- available data sets are pre- 
liminary; the foreground subtraction for the final CMB-only 
maps is not perfect, and there may well be residual system- 



atics in the instrument noise. Just as the data set is prelim- 
inary so is this analysis. We look forward to further releases 
in order to establish whether the non-Gaussianity that we 
have detected can be entirely explained by such artifacts. 
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